179 research outputs found

    Knowledge Organization Research in the last two decades: 1988-2008

    We apply an automatic topic mapping system to records of publications in knowledge organization (KO) published between 1988 and 2008. The data were collected from the Web of Science (WoS) database, from journals publishing articles in the KO field. The results showed that while topics in the first decade (1988-1997) were more traditional, the second decade (1998-2008) was marked by a more technological orientation and by the appearance of more specialized topics driven by the pervasiveness of the Web environment.

    The landscape of Information Science: 1996-2008

    We propose a methodology combining symbolic and numeric information to map the structure of research in Information Science between 1996 and 2008. The visualization of the resulting maps showed that while the two-camp structure of Information Science observed in previous studies is still valid, other research poles such as web and user-oriented studies are building bridges between the two hitherto isolated poles.

    Combining Language Models with NLP and Interactive Query Expansion.

    Following our previous participation in the INEX 2008 Ad-hoc track, we continue to address both standard and focused retrieval tasks based on comprehensible language models and interactive query expansion (IQE). Query topics are expanded using an initial set of Multiword Terms (MWTs) selected from the top n ranked documents. In this experiment, we extract MWTs from article titles, the narrative field and automatically generated summaries. We combine the initial set of MWTs obtained in an IQE process with automatic query expansion (AQE) using language models and a smoothing mechanism. We chose as baseline the Indri IR engine, which is based on the language model with Dirichlet smoothing. We also compare the performance of bag-of-words approaches (TFIDF and BM25) to search strategies built using language models and query expansion (QE). The experiment is carried out on all INEX 2009 Ad-hoc tasks.

    Text mining without document context

    We consider a challenging clustering task: the clustering of multi-word terms without document co-occurrence information in order to form coherent groups of topics. For this task, we developed a methodology taking as input multi-word terms and lexico-syntactic relations between them. Our clustering algorithm, named CPCL, is implemented in the TermWatch system. We compared CPCL to other existing clustering algorithms, namely hierarchical and partitioning (k-means, k-medoids). This out-of-context clustering task led us to adapt multi-word term representation for statistical methods and also to refine an existing cluster evaluation metric, the editing distance, in order to evaluate the methods. Evaluation was carried out on a list of multi-word terms from the genomic field that comes with a hand-built taxonomy. Results showed that while k-means and k-medoids obtained good scores on the editing distance, they were very sensitive to term length. CPCL, on the other hand, obtained a better cluster homogeneity score and was less sensitive to term length. CPCL also showed good adaptability for handling very large and sparse matrices.

    Annotation of Scientific Summaries for Information Retrieval.

    We present a methodology combining surface NLP and Machine Learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence bears (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi-abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised to take into account the distribution of the tags in the corpus. The results, although still preliminary, encourage us to pursue this line of work and to find better ways of building IR systems that can take into account semantic annotations in a corpus.

    Decomposition of terminology graphs for domain knowledge acquisition.

    We propose a graph decomposition algorithm for analyzing the structure of complex graph networks. After multi-word term extraction, we apply techniques from text mining and visual analytics in a novel way by integrating symbolic and numeric information to build clusters of domain topics. Terms are clustered based on surface linguistic variations, and clusters are inserted in an association network based on their intersection with documents. The graph is then decomposed, based on atom graph structure, into a central (non-decomposable) atom and peripheral atoms. The whole process is applied to publications from the Sloan Digital Sky Survey (SDSS) project in the Astronomy field. The mapping obtained was evaluated by a domain expert and appeared to have captured interesting conceptual relations between different domain topics.

    SDOC et TermWatch : deux méthodes complémentaires de cartographie de thèmes [SDOC and TermWatch: two complementary methods for topic mapping]

    The aim of this paper is to compare two methods, originally designed for scientific and technical watch, in a text mining application. Both methods let the user visualize the results of an unsupervised hierarchical clustering of textual data as a thematic map. They are nevertheless complementary: one, SDOC, is based on the analysis of the co-occurrence matrix and positions the classes (clusters) on the plane according to their structural properties, while the other, TermWatch, clusters terms solely on the basis of their syntactic variation links and presents the results as a network that can be viewed with the AiSee software, in which links are drawn tighter the more thematically close the classes are assumed to be.

    Identifying Thematic Variations in SDSS research.

    The Sloan Digital Sky Survey (SDSS) is the largest ongoing sky survey. It regularly makes data releases to the astronomical community. From a macroscopic point of view, a profound question is: what is the role of SDSS data releases in the evolution of the relevant scientific fields? In this paper, we introduce an integrated approach combining statistical, information-theoretical, and symbolic methods for text data analysis, and show how this combined approach can distinguish thematic variations associated with the different data releases.